CRISP-DM: Towards a Standard Process Model for Data Mining

Authors

  • Rüdiger Wirth
  • Jochen Hipp
Abstract

The CRISP-DM (CRoss Industry Standard Process for Data Mining) project proposed a comprehensive process model for carrying out data mining projects. The process model is independent of both the industry sector and the technology used. In this paper we argue in favor of a standard process model for data mining and report some experiences with the CRISP-DM process model in practice. We applied and tested the CRISP-DM methodology in a response modeling application project. The final goal of the project was to specify a process that can be reliably and efficiently repeated by different people and adapted to different situations. The initial projects were performed by experienced data mining practitioners; future projects are to be performed by people with lower technical skills and very little time to experiment with different approaches. It turned out that the CRISP-DM methodology, with its distinction between generic and specialized process models, provides both the structure and the flexibility needed to suit both groups. The generic CRISP-DM process model is useful for planning, for communication within and outside the project team, and for documentation. The generic checklists are helpful even for experienced people. The generic process model provides an excellent foundation for developing a specialized process model that prescribes the steps to be taken in detail and gives practical advice for each of these steps.


Similar resources

A cost model to estimate the effort of data mining projects (DMCoMo)

CRISP-DM is the standard to develop Data Mining projects. CRISP-DM proposes processes and tasks that you have to carry out to develop a Data Mining project. A task proposed by CRISP-DM is the cost estimation of the Data

CASP-DM: Context Aware Standard Process for Data Mining

We propose an extension of the Cross Industry Standard Process for Data Mining (CRISP-DM) which addresses specific challenges of machine learning and data mining for context and model reuse handling. This new general context-aware process model is mapped to the CRISP-DM reference model, proposing some new or enhanced outputs.

Cost Drivers of a Parametric Cost Estimation Model for Data Mining Projects (DMCOMO)

Data Mining is a research line that began in 1980 in order to find the knowledge hidden in the data that organizations store on a daily basis. This knowledge supports the decision-making processes in organizations. As a consequence, companies of every kind have been developing data mining projects since the term appeared. However, there is no way to estimate this kind of project. ...

The Search for Gold Nuggets Using CRISP-DM Without a Seasoned Miner

The rise of data mining has brought many changes to people’s lives but also to companies and the importance of data analysis. Companies always had a tendency to gather as much data as possible but it has only been recently due to the developments in IT that large quantities of data can be analyzed in a fast and easy way. This new field gave rise to the methodology of Cross Industry Standard Pro...

Adapting CRISP-DM Process for Social Network Analytics: Application to Healthcare

One of the key limitations about research involving big data is the lack of a sound methodological process that drives the conceptual and analytical questions posed to the data. In this study, we adapt the popular CRISP-DM process to analyze large volumes of unstructured data to generate analytical insights. We add specificity to the CRISP-DM methodology. Specifically, we propose “Cross Industr...


Publication date: 2000